AITopics | bridge language

Collaborating Authors

bridge language

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LLM as a Broken Telephone: Iterative Generation Distorts Information

Mohamed, Amr, Geng, Mingmeng, Vazirgiannis, Michalis, Shang, Guokan

arXiv.org Artificial IntelligenceFeb-27-2025

As large language models are increasingly responsible for online content, concerns arise about the impact of repeatedly processing their own outputs. Inspired by the "broken telephone" effect in chained human communication, this study investigates whether LLMs similarly distort information through iterative generation. Through translation-based experiments, we find that distortion accumulates over time, influenced by language choice and chain complexity. While degradation is inevitable, it can be mitigated through strategic prompting techniques. These findings contribute to discussions on the long-term effects of AI-mediated information propagation, raising important questions about the reliability of LLM-generated content in iterative workflows.

arxiv preprint arxiv, distortion, iteration, (13 more...)

arXiv.org Artificial Intelligence

2502.20258

Country:

Europe > Ireland (0.05)
Europe > United Kingdom > England (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Sports > Soccer (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

MultiSlav: Using Cross-Lingual Knowledge Transfer to Combat the Curse of Multilinguality

Kot, Artur, Koszowski, Mikołaj, Chojnowski, Wojciech, Rutkowski, Mieszko, Nowakowski, Artur, Guttmann, Kamil, Pokrywka, Mikołaj

arXiv.org Artificial IntelligenceFeb-20-2025

Does multilingual Neural Machine Translation (NMT) lead to The Curse of the Multlinguality or provides the Cross-lingual Knowledge Transfer within a language family? In this study, we explore multiple approaches for extending the available data-regime in NMT and we prove cross-lingual benefits even in 0-shot translation regime for low-resource languages. With this paper, we provide state-of-the-art open-source NMT models for translating between selected Slavic languages. We released our models on the HuggingFace Hub (https://hf.co/collections/allegro/multislav-6793d6b6419e5963e759a683) under the CC BY 4.0 license. Slavic language family comprises morphologically rich Central and Eastern European languages. Although counting hundreds of millions of native speakers, Slavic Neural Machine Translation is under-studied in our opinion. Recently, most NMT research focuses either on: high-resource languages like English, Spanish, and German - in WMT23 General Translation Task 7 out of 8 task directions are from or to English; massively multilingual models covering multiple language groups; or evaluation techniques.

computational linguistic, proceedings, translation, (14 more...)

arXiv.org Artificial Intelligence

2502.14509

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(11 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Multilingual k-Nearest-Neighbor Machine Translation

Stap, David, Monz, Christof

arXiv.org Artificial IntelligenceOct-23-2023

k-nearest-neighbor machine translation has demonstrated remarkable improvements in machine translation quality by creating a datastore of cached examples. However, these improvements have been limited to high-resource language pairs, with large datastores, and remain a challenge for low-resource languages. In this paper, we address this issue by combining representations from multiple languages into a single datastore. Our results consistently demonstrate substantial improvements not only in low-resource translation quality (up to +3.6 BLEU), but also for high-resource translation quality (up to +0.5 BLEU). Our experiments show that it is possible to create multilingual datastores that are a quarter of the size, achieving a 5.3x speed improvement, by using linguistic similarities for datastore creation.

computational linguistic, datastore, multilingual datastore, (14 more...)

arXiv.org Artificial Intelligence

2310.14644

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(7 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

Gender Lost In Translation: How Bridging The Gap Between Languages Affects Gender Bias in Zero-Shot Multilingual Translation

Cabrera, Lena, Niehues, Jan

arXiv.org Artificial IntelligenceMay-26-2023

Neural machine translation (NMT) models often suffer from gender biases that harm users and society at large. In this work, we explore how bridging the gap between languages for which parallel data is not available affects gender bias in multilingual NMT, specifically for zero-shot directions. We evaluate translation between grammatical gender languages which requires preserving the inherent gender information from the source in the target language. We study the effect of encouraging language-agnostic hidden representations on models' ability to preserve gender and compare pivot-based and zero-shot translation regarding the influence of the bridge language (participating in all language pairs during training) on gender preservation. We find that language-agnostic representations mitigate zero-shot models' masculine bias, and with increased levels of gender inflection in the bridge language, pivoting surpasses zero-shot translation regarding fairer gender preservation for speaker-related gender agreement.

large language model, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2305.16935

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

V\=arta: A Large-Scale Headline-Generation Dataset for Indic Languages

Aralikatte, Rahul, Cheng, Ziling, Doddapaneni, Sumanth, Cheung, Jackie Chi Kit

arXiv.org Artificial IntelligenceMay-9-2023

We present V\=arta, a large-scale multilingual dataset for headline generation in Indic languages. This dataset includes 41.8 million news articles in 14 different Indic languages (and English), which come from a variety of high-quality sources. To the best of our knowledge, this is the largest collection of curated articles for Indic languages currently available. We use the data collected in a series of experiments to answer important questions related to Indic NLP and multilinguality research in general. We show that the dataset is challenging even for state-of-the-art abstractive models and that they perform only slightly better than extractive baselines. Owing to its size, we also show that the dataset can be used to pretrain strong language models that outperform competitive baselines in both NLU and NLG benchmarks.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.05858

Country:

Africa > Middle East > Djibouti > Arta > `Arta (0.60)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > India > Tamil Nadu > Chennai (0.04)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology (0.67)
Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Introducing the First AI Model That Translates 100 Languages Without Relying on English

#artificialintelligenceNov-17-2021, 09:30:34 GMT

Next, we introduced a new bridge mining strategy, in which we group languages into 14 language groups based on linguistic classification, geography, and cultural similarities. People living in countries with languages of the same family tend to communicate more often and would benefit from high-quality translations. For instance, one group would include languages spoken in India, like Bengali, Hindi, Marathi, Nepali, Tamil, and Urdu. To connect the languages of different groups, we identified a small number of bridge languages, which are usually one to three major languages of each group. In the example above, Hindi, Bengali, and Tamil would be bridge languages for Indo-Aryan languages.

bridge language, parallel data, translate 100, (7 more...)

#artificialintelligence

Country: Asia > India (0.27)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.55)

Add feedback

The first AI model that translates 100 languages without relying on English data

#artificialintelligenceOct-20-2020, 09:55:25 GMT

Facebook AI is introducing, M2M-100 the first multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English data. When translating, say, Chinese to French, previous best multilingual models train on Chinese to English and English to French, because English training data is the most widely available. Our model directly trains on Chinese to French data to better preserve meaning. It outperforms English-centric systems by 10 points on the widely used BLEU metric for evaluating machine translations. M2M-100 is trained on a total of 2,200 language directions -- or 10x more than previous best, English-centric multilingual models.

machine translation, multilingual model, translation, (16 more...)

#artificialintelligence

Country: Asia > India (0.05)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

PR + RQ ≈ PQ: Transliteration Mining Using Bridge Language

Khapra, Mitesh M. (Indian Institute of Technology Bombay) | Udupa, Raghavendra (Microsoft Research India) | Kumaran, A. (Microsoft Research India) | Bhattacharyya, Pushpak (Indian Institute of Technology Bombay)

AAAI ConferencesJul-15-2010

We address the problem of mining name transliterations from comparable corpora in languages P and Q in the following resource-poor scenario: Parallel names in PQ are not available for training. Parallel names in PR and RQ are available for training. We propose a novel solution for the problem by computing a common geometric feature space for P,Q and R where name transliterations are mapped to similar vectors. We employ Canonical Correlation Analysis (CCA) to compute the common geometric feature space using only parallel names in PR and RQ and without requiring parallel names in PQ. We test our algorithm on data sets in several languages and show that it gives results comparable to the state-of-the-art transliteration mining algorithms that use parallel names in PQ for training.

artificial intelligence, machine learning, natural language, (17 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

Asia > India > Maharashtra > Mumbai (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Texas (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback